Data Viz Overview

Data visualization is an essential part of statistical analysis, both for exploring data and for summarizing findings.

Exploratory Data Analysis
  • Used for data quality checks

  • Helps explore and understand the data

  • Typically not seen by anyone else

Polished Data Visualization
  • Used to summarize data in presentations or papers

  • Should stand alone with appropriate titles, axes, labels, and captions

Spatial Data Viz Tools

There are many tools for creating spatial figures (GIS software, Tableau, etc.), but we will exclusively use R and its wide range of packages.

In particular, we will use:

  • ggplot2

  • ggmap

  • leaflet

  • RgoogleMaps

  • and many others…

Point Data: What is this?

Point Data: How about now?

Point Data: Is this better?

ggmap Code Overview

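  library(readr)    # read_file()
  library(ggplot2)  # geom_point(), labs(), theme()
  library(ggmap)    # register_google(), get_map(), ggmap()

  # Read the Google Maps API key from a local file and register it
  mykey <- read_file('./google_api.txt')
  register_google(key = mykey)

  # Download a road map centered on New York City
  myMap <- get_map(location = c(lon = -74, lat = 40.75),
                   source = "google",
                   maptype = "roadmap", crop = FALSE,
                   zoom = 11, api_key = mykey)

  # Overlay the pickups as small, highly transparent points.
  # `uber` is assumed to be a data frame of pickups with Lon and Lat columns.
  ggmap(myMap) +
    geom_point(aes(x = Lon, y = Lat), alpha = .03, size = .5, data = uber) +
    labs(title = 'Location of Uber pickups on May 1, 2014 for NYC Destinations',
         caption = 'source: https://www.kaggle.com/fivethirtyeight/uber-pickups-in-new-york-city') +
    # Longitude/latitude axes add little on top of a base map, so hide them
    theme(axis.title.x = element_blank(),
          axis.text.x = element_blank(),
          axis.ticks.x = element_blank(),
          axis.title.y = element_blank(),
          axis.text.y = element_blank(),
          axis.ticks.y = element_blank())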

Principles for Point Data

  1. Include useful background for appropriate context: there are several approaches for acquiring maps in R. Sometimes streets may be more useful, but in other situations a terrain image might be more relevant.
  2. With point patterns, use transparency or heat map summaries to distinguish between areas of higher and lower intensity.
  3. Include useful titles, labels, and, where appropriate, captions on all figures. These figures should stand alone.
  4. Sources should be cited in figures.
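
As a rough illustration of principle 2, the sketch below draws the same point pattern two ways: once with transparent points and once as a binned heat map. The data are simulated stand-ins, not the Uber pickups, and the object names (pts, p_alpha, p_heat) are made up for this example. It assumes the ggplot2 package is installed.

```r
library(ggplot2)

set.seed(534)

# Simulated stand-in for pickup locations: one large cluster and one
# smaller, tighter cluster (these are NOT the Uber data)
pts <- data.frame(
  Lon = c(rnorm(2000, mean = -73.98, sd = 0.02),
          rnorm(500,  mean = -73.87, sd = 0.01)),
  Lat = c(rnorm(2000, mean = 40.75, sd = 0.02),
          rnorm(500,  mean = 40.77, sd = 0.01))
)

# Option 1: transparency, so overlapping points build up darker regions
p_alpha <- ggplot(pts, aes(x = Lon, y = Lat)) +
  geom_point(alpha = 0.05, size = 0.5) +
  labs(title = "Simulated pickups: transparent points",
       caption = "source: simulated data")

# Option 2: heat map summary, counting points in a 40 x 40 grid of bins
p_heat <- ggplot(pts, aes(x = Lon, y = Lat)) +
  geom_bin2d(bins = 40) +
  scale_fill_viridis_c(name = "Count") +
  labs(title = "Simulated pickups: binned counts",
       caption = "source: simulated data")

# Print either object to draw it, e.g. print(p_heat)
```

The same layers could presumably be added to ggmap(myMap) instead of ggplot(), passing the data frame and aes() to the layer itself, as in the geom_point() call above.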

Cloning a Repo

Luckily for all of us, this course will not consist of me talking for 75 minutes. Rather, active learning components will be interspersed throughout each class. In general, we will spend some time talking about a topic and then I will give you time to work through a data visualization or analysis exercise.

To facilitate the active learning sessions, I'd recommend cloning the repo at the start of class. For instance, here is the repo for today: https://github.com/Stat534/Lecture3. Using RStudio to create a local project is one way to do this, but you won't have push access to my repo.

Active Learning Exercise: Seattle Police Calls Data Viz

seattle <- read_csv('./SeattlePolice.csv')
## Parsed with column specification:
## cols(
##   CAD.Event.Number = col_double(),
##   Event.Clearance.Description = col_character(),
##   Event.Clearance.SubGroup = col_character(),
##   Event.Clearance.Group = col_character(),
##   Census.Tract = col_double(),
##   Longitude = col_double(),
##   Latitude = col_double(),
##   Year = col_integer(),
##   Month = col_integer(),
##   Day = col_integer()
## )
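
Before mapping these calls, a quick data quality check on the coordinates is one possible first step. The sketch below uses a small invented data frame (seattle_demo) that borrows the Longitude, Latitude, and Year column names from the specification above; the values and the bounding box are assumptions for illustration only, not properties of SeattlePolice.csv.

```r
# Stand-in rows with the same column names as the parsed file;
# all values are invented for illustration
seattle_demo <- data.frame(
  Longitude = c(-122.33, -122.35, NA, 0),
  Latitude  = c(47.61, 47.62, 47.60, 0),
  Year      = c(2015L, 2015L, 2016L, 2016L)
)

# Quick numeric summary: look for NAs and implausible minimums/maximums
summary(seattle_demo[, c("Longitude", "Latitude")])

# Flag rows whose coordinates are present and fall inside a rough,
# assumed bounding box around Seattle
in_box <- with(seattle_demo,
  !is.na(Longitude) & !is.na(Latitude) &
    Longitude > -122.5 & Longitude < -122.2 &
    Latitude > 47.4 & Latitude < 47.8
)

# How many rows pass or fail the check?
table(in_box)

# Keep only the rows that pass
seattle_clean <- seattle_demo[in_box, ]
```

Rows that fail the check are worth inspecting before dropping them, since a pattern in the failures (for example, coordinates recorded as 0) often points to how missing locations were encoded.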

Cartography

Distance Calculations

A collaborator suggests that there may be a spatial relationship between the police calls in the Seattle data set. How would you calculate the distance between those points?
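
One possible answer, sketched here under stated assumptions, is the great-circle (haversine) distance, which treats the Earth as a sphere with a radius of 6371 km. The function name haversine_km and the three example coordinates are hypothetical and are not taken from SeattlePolice.csv. Packages such as geosphere (distHaversine()) offer comparable calculations; this base R version is only meant to show the arithmetic.

```r
# Great-circle (haversine) distance in kilometers between points given in
# decimal degrees. All arguments are vectorized. Assumes a spherical Earth.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical call locations (invented values, Seattle-like coordinates)
calls <- data.frame(
  Longitude = c(-122.3321, -122.3493, -122.3035),
  Latitude  = c(47.6062, 47.6205, 47.6553)
)

# Pairwise distance matrix: entry [i, j] is the distance in km
# between call i and call j
idx <- seq_len(nrow(calls))
dist_km <- outer(idx, idx, function(i, j) {
  haversine_km(calls$Longitude[i], calls$Latitude[i],
               calls$Longitude[j], calls$Latitude[j])
})

round(dist_km, 2)
```

Over an area the size of a city, distances on projected coordinates would give very similar answers; the haversine form has the advantage of working directly on longitude and latitude.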

Additional References